Strategies for searching the space of variables in combinatorial chemistry experiments are presented, 
and a random energy model of combinatorial chemistry experiments is introduced. The search 
strategies, derived by analogy with the computer modeling technique of Monte Carlo, effectively 
search the variable space even in combinatorial chemistry experiments of modest size. Efficient 
implementations of the library design and redesign strategies are feasible with current experimental 
capabilities. 

I. INTRODUCTION 

The goal of combinatorial materials discovery is to find 
compositions of matter that maximize a specific material 
property, such as superconductivity, magnetoresistance, 
luminescence, ligand specificity, sensor response, or 
catalytic activity. 
This problem can be reformulated as one of searching 
a multi-dimensional space, with the material composi- 
tion, impurity levels, and synthesis conditions as vari- 
ables. The property to be optimized, the figure of merit, 
is generally an unknown function of the variables and can 
be measured only experimentally. 

Present approaches to combinatorial library design and 
screening invariably perform a grid search in composition 
space, followed by a "steepest-ascent" maximization of 
the figure of merit. This procedure becomes inefficient in 
high-dimensional spaces or when the figure of merit is not 
a smooth function of the variables, and its use has limited 
most combinatorial chemistry experiments to ternary or 
quaternary compounds. 

In this paper, we suggest new experimental proto- 
cols for searching the space of variables in combinatorial 
chemistry, exploiting an analogy between combinatorial 
materials discovery and Monte Carlo computer model- 
ing methods. In Section II we discuss several of these 
strategies for library design and redesign. In Section III 
we introduce the Random Phase Volume Model that we 
will use to compare the different methods. The effective- 
ness of different strategies is discussed in Section IV. We 
conclude in Section V. 



II. SAMPLING THE SPACE OF VARIABLES IN 
MATERIALS DISCOVERY 

Several variables can be manipulated in order to seek 
the material with the optimal figure of merit. Material 
composition is certainly a variable. Film thickness and 
deposition method are also variables for materials made 
in thin film form. The processing history, such as 
temperature, pressure, pH, and atmospheric composition, 
is a variable. The guest composition or impurity level 
can greatly affect the figure of merit. In addition, the 
"crystallinity" of the material can affect the observed 
figure of merit. Finally, the method of nucleation or 
synthesis may affect the phase or morphology of the 
material and so affect the figure of merit. 

We assume that the composition and non-composition 
variables of each sample can be changed independently. 
Then, instead of a grid search on the composition 
and non-composition variables, we consider choosing the 
variables at random from the allowed values. We also 
consider choosing the variables in a fashion that attempts 
to maximize the amount of information gained from the 
limited number of samples screened, via a quasi-random, 
low-discrepancy sequence (LDS). 
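
The two single-pass choices of variables described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the text does not say which low-discrepancy sequence is used, so a Halton sequence is assumed here, and the function names are hypothetical. Only the non-composition variables are drawn from the sequence; mapping a low-discrepancy sequence onto the composition simplex is not shown.

```python
import random


def random_composition(d, rng):
    """Draw d mole fractions uniformly from the simplex x_i >= 0, sum x_i = 1."""
    # Normalized exponential variates are uniformly distributed on the simplex.
    g = [rng.expovariate(1.0) for _ in range(d)]
    total = sum(g)
    return [v / total for v in g]


def random_noncomposition(b, rng):
    """Draw b non-composition variables uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(b)]


def halton(index, base):
    """Term number `index` (>= 1) of the van der Corput sequence in `base`."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)
        index //= base
        f /= base
    return result


def lds_noncomposition(n, b):
    """First n Halton points in b dimensions, rescaled from [0, 1) to [-1, 1)."""
    primes = [2, 3, 5, 7, 11, 13]
    if b > len(primes):
        raise ValueError("this sketch supports at most 6 dimensions")
    return [[2.0 * halton(i, p) - 1.0 for p in primes[:b]]
            for i in range(1, n + 1)]
```

A random library of 1,000 samples with d = 3 and b = 1 would then be built by calling `random_composition(3, rng)` and `random_noncomposition(1, rng)` once per sample with `rng = random.Random(seed)`.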

We further consider performing multiple rounds of 
screening, incorporating feedback as the experiment 
proceeds by treating the combinatorial chemistry experiment 
as a Monte Carlo simulation in the laboratory. This leads 
to sampling the experimental figure of merit, $E$, with 
probability proportional to $\exp(\beta E)$. If $\beta$ is 
large, then the Monte Carlo procedure will seek out values 
of the composition and non-composition variables that 
maximize the figure of merit. If $\beta$ is too large, 
however, the Monte Carlo procedure will get stuck in 
relatively low-lying local maxima. The first round is 
initiated by choosing the composition and non-composition 
variables at random from the allowed values. The variables 
are changed in succeeding rounds as dictated by the Monte 
Carlo procedure. 

Two ways of changing the variables are considered: 
randomly changing the variables of a randomly chosen 
sample a small amount and exchanging a subset of the 
variables between two randomly chosen samples. These 
moves are repeated until all the samples in a round have 
been modified. The values of the figure of merit for the 
proposed new samples are then measured. Whether to 
accept the newly proposed samples or to keep the current 
samples for the next round is decided according to the 
detailed balance acceptance criterion. For the random 
change of one sample, we find the Metropolis acceptance 
probability: 



$$\mathrm{acc}(c \to p) = \min\left\{1, \exp\left[\beta\left(E_{\mathrm{proposed}} - E_{\mathrm{current}}\right)\right]\right\} \tag{1}$$

Proposed samples that increase the figure of merit are 
always accepted; proposed samples that decrease the figure 
of merit are accepted with the Metropolis probability. 
Allowing the figure of merit occasionally to decrease is 
what allows samples to escape from local maxima. The random 
displacement of the $d$ mole fraction variables, $x_i$, is 
done in the $(d-1)$-dimensional subspace orthogonal to the 
$d$-dimensional vector $(1, 1, \ldots, 1)$. This procedure 
ensures that the constraint $\sum_{i=1}^{d} x_i = 1$ is 
maintained. This subspace is identified by the Gram-Schmidt 
procedure. Moves that violate the constraint $x_i \ge 0$ 
are rejected. Moves that lead to invalid values of the 
non-composition variables are also rejected. For the 
swapping move applied to samples $i$ and $j$, we find the 
modified acceptance probability: 



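$$\mathrm{acc}(c \to p) = \min\left\{1, \exp\left[\beta\left(E^{i}_{\mathrm{proposed}} + E^{j}_{\mathrm{proposed}} - E^{i}_{\mathrm{current}} - E^{j}_{\mathrm{current}}\right)\right]\right\} \tag{2}$$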
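
The two acceptance tests can be sketched in Python as follows. This is a minimal illustration, assuming that Eq. (1) compares the proposed and current figures of merit of one sample and that Eq. (2) compares the summed figures of merit of the two swapped samples; the function names and the `rng` argument are hypothetical.

```python
import math
import random


def metropolis_accept(e_current, e_proposed, beta, rng):
    """Accept with probability min{1, exp[beta (E_proposed - E_current)]}."""
    delta = e_proposed - e_current
    if delta >= 0.0:
        return True  # improvements are always accepted
    return rng.random() < math.exp(beta * delta)


def swap_accept(e_cur_i, e_cur_j, e_prop_i, e_prop_j, beta, rng):
    """Joint test for a swap move that modifies samples i and j together."""
    delta = (e_prop_i + e_prop_j) - (e_cur_i + e_cur_j)
    if delta >= 0.0:
        return True
    return rng.random() < math.exp(beta * delta)
```

Because the figure of merit is being maximized, the exponent carries a positive sign in front of the change in $E$, in contrast to the usual energy-minimizing form of the Metropolis rule.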



Fig. 1a shows one round of a Monte Carlo procedure. 
The parameter $\beta$ is not related to the thermodynamic 
temperature of the experiment and should be optimized 
for best efficiency. The characteristic sizes of the random 
changes in the composition and non-composition variables 
are also parameters that should be optimized. 
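
A constrained random displacement of this kind can be sketched as below. Note the substitution: instead of building a Gram-Schmidt basis for the subspace, this sketch removes the mean of a random displacement, which projects it onto the same subspace orthogonal to (1, ..., 1). The resulting step distribution is not necessarily identical to that of the Gram-Schmidt construction, and the function names and `step` parameter are hypothetical.

```python
import random


def propose_composition(x, step, rng):
    """Return a displaced composition, or None if the move leaves the simplex."""
    d = len(x)
    raw = [rng.uniform(-step, step) for _ in range(d)]
    mean = sum(raw) / d
    # Subtracting the mean makes the displacement sum to zero,
    # so the constraint sum(x) = 1 is preserved.
    new_x = [xi + (r - mean) for xi, r in zip(x, raw)]
    if min(new_x) < 0.0:
        return None  # violates x_i >= 0: the move is rejected
    return new_x


def propose_noncomposition(z, step, rng):
    """Return displaced non-composition variables, or None if out of [-1, 1]."""
    new_z = [zi + rng.uniform(-step, step) for zi in z]
    if any(abs(v) > 1.0 for v in new_z):
        return None  # invalid value: the move is rejected
    return new_z
```

A rejected move (`None`) means the current sample is simply kept for the next round, so no screening is spent on an invalid proposal.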

If the number of composition and non-composition 
variables is too great, or if the figure of merit changes 
with the variables in a too-rough fashion, normal Monte 
Carlo will not achieve effective sampling. Parallel 
tempering is a natural extension of Monte Carlo that is 
used to study statistical, spin glass, and molecular 
systems with rugged energy landscapes. Our most powerful 
protocol incorporates the method of parallel tempering 
for changing the system variables. In parallel tempering, 
a fraction of the samples are updated by Monte Carlo with 
parameter $\beta_1$, a fraction by Monte Carlo with 
parameter $\beta_2$, and so on. At the end of each round, 
samples are randomly exchanged between the groups with 
different $\beta$'s, as shown in Fig. 1b. The acceptance 
probability for exchanging two samples is 



$$\mathrm{acc}(c \to p) = \min\left\{1, \exp\left[\Delta\beta\,\Delta E\right]\right\} \tag{3}$$



where $\Delta\beta$ is the difference in the values of 
$\beta$ between the two groups, and $\Delta E$ is the 
difference in the figures of merit between the two samples. 
It is important to notice that this exchange step does not 
involve any extra screening compared to Monte Carlo and is, 
therefore, "free" in terms of experimental costs. This step 
is, however, dramatically effective at helping the protocol 
escape from local maxima. The number of different systems 
and the temperatures of each system are parameters 
that must be optimized. 
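
The exchange test can be sketched in Python as follows. The text does not fix the order in which the two differences are taken, so the sign convention here is an assumption: for sampling proportional to $\exp(\beta E)$, exchanging a sample with figure of merit $E_a$ in the group at $\beta_a$ with a sample with $E_b$ in the group at $\beta_b$ changes the joint log-weight by $(\beta_a - \beta_b)(E_b - E_a)$, and this product is used as $\Delta\beta\,\Delta E$. The function name is hypothetical.

```python
import math
import random


def exchange_accept(beta_a, e_a, beta_b, e_b, rng):
    """Accept an exchange between groups a and b with probability
    min{1, exp[(beta_a - beta_b) (e_b - e_a)]}."""
    # Log of the ratio exp(beta_a*e_b + beta_b*e_a) / exp(beta_a*e_a + beta_b*e_b).
    log_ratio = (beta_a - beta_b) * (e_b - e_a)
    if log_ratio >= 0.0:
        return True
    return rng.random() < math.exp(log_ratio)
```

With this convention, moving the better of the two samples into the group with the larger $\beta$ is always accepted, while the reverse exchange is accepted only occasionally. The test uses only figures of merit that have already been measured.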

To summarize, the first round of combinatorial chemistry 
consists of the following steps: constructing the initial 
library of samples, measuring the initial figures of 
merit, changing the variables of each sample a small ran- 
dom amount or swapping subsets of the variables between 
pairs of samples, constructing the proposed new library 
of samples, measuring the figures of merit of the pro- 
posed new samples, accepting or rejecting each of the pro- 
posed new samples, and performing parallel tempering 
exchanges. Following rounds of combinatorial chemistry 
repeat these steps, starting with making changes to the 
current values of the composition and non-composition 
variables. These steps are repeated for as many rounds 
as desired, or until maximal figures of merit are found. 
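
The sequence of steps above can be sketched as a compact Python loop. Everything specific in it is an assumption made for illustration: `toy_figure_of_merit` is a hypothetical stand-in for a laboratory screen (it is not the model of Section III), each sample carries only two variables in [-1, 1], the swap move is omitted, and exchanges are attempted once per round between neighboring $\beta$ groups.

```python
import math
import random


def toy_figure_of_merit(z):
    # Hypothetical stand-in for an experimental measurement.
    return sum(math.cos(5.0 * v) - v * v for v in z)


def run_rounds(betas, per_group, n_rounds, step, rng, measure=toy_figure_of_merit):
    """Run several rounds of library redesign; return the best value measured."""
    # Construct the initial library at random and measure it.
    groups = []
    for beta in betas:
        lib = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(per_group)]
        groups.append([beta, lib, [measure(z) for z in lib]])
    best = max(e for _, _, es in groups for e in es)

    for _ in range(n_rounds):
        # Propose, measure, and accept or reject a change to every sample.
        for beta, lib, es in groups:
            for k in range(len(lib)):
                prop = [v + rng.uniform(-step, step) for v in lib[k]]
                if any(abs(v) > 1.0 for v in prop):
                    continue  # invalid move: keep the current sample
                e_new = measure(prop)
                delta = e_new - es[k]
                if delta >= 0.0 or rng.random() < math.exp(beta * delta):
                    lib[k], es[k] = prop, e_new
                    best = max(best, e_new)
        # Parallel tempering exchanges between neighboring groups.
        # These reuse existing measurements and need no new screening.
        for (ba, la, ea), (bb, lb, eb) in zip(groups, groups[1:]):
            i, j = rng.randrange(len(la)), rng.randrange(len(lb))
            log_ratio = (ba - bb) * (eb[j] - ea[i])
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                la[i], lb[j] = lb[j], la[i]
                ea[i], eb[j] = eb[j], ea[i]
    return best
```

For example, `run_rounds([50.0, 10.0, 1.0], 10, 50, 0.1, random.Random(4))` runs 50 rounds with 10 samples in each of three groups. In a laboratory setting, `measure` would be replaced by the synthesis and screening of the proposed sample.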

We have chosen to sample the figure of merit by Monte 
Carlo, rather than to optimize it globally by some other 
method, for several reasons. First, Monte Carlo is an 
effective stochastic optimization method. Second, sim- 
ple global optimization may be misleading since concerns 
such as patentability, cost of materials, and ease of syn- 
thesis are not usually included in the experimental figure 
of merit. Moreover, the screen that is most easily per- 
formed in the laboratory, the "primary screen," is usually 
only roughly correlated with the true figure of merit. In- 
deed, after finding materials that look promising based 
upon the primary screen, experimental secondary and 
tertiary screens are usually performed to identify that 
material which is truly optimal. Third, it might be ad- 
vantageous to screen for several figures of merit at once. 
For all of these reasons, sampling by Monte Carlo to pro- 
duce several candidate materials is preferred over global 
optimization. 



III. THE RANDOM PHASE VOLUME MODEL 

The effectiveness of these protocols is demonstrated 
by combinatorial chemistry experiments as simulated by 
the Random Phase Volume Model. The Random Phase 
Volume Model is not fundamental to the protocols; it is 
introduced as a simple way to test, parameterize, and 
validate the various searching methods. The model re- 
lates the figure of merit to the composition and non- 
composition variables in a statistical way. The model 
is fast enough to allow for validation of the proposed 
searching methods on an enormous number of samples, 
yet possesses the correct statistics for the figure-of-merit 
landscape. The $d$-dimensional vector of composition mole 
fractions is denoted by $\mathbf{x}$. The composition mole 
fractions are non-negative and sum to unity, and so the 
allowed compositions are constrained to lie within a 
simplex in $d$ dimensions. For the familiar ternary system, 
this simplex is an equilateral triangle. The composition 
variables are grouped into phases centered around points 
$\mathbf{x}_\alpha$, randomly placed within the allowed 
composition range (the phases form a Voronoi diagram, see 
Fig. 2). The model is defined for any number of composition 
variables, and the number of phase points is defined by 
requiring the average spacing between phase points to be 
$\xi = 0.25$. To avoid edge effects, additional points are 
added in a belt of width $2\xi$ around the simplex of 
allowed compositions. The figure of merit should change 
dramatically between composition phases. Moreover, within 
each phase $\alpha$, the figure of merit should also vary 
with $\mathbf{y} = \mathbf{x} - \mathbf{x}_\alpha$ due to 
crystallinity effects such as crystallite size, 
intergrowths, defects, and faulting. In addition, the 
non-composition variables should also affect the measured 
figure of merit. The non-composition variables are denoted 
by the $b$-dimensional vector $\mathbf{z}$, with each 
component constrained to fall within the range $[-1, 1]$ 
without loss of generality. There can be any number of 
non-composition variables. The figure of merit depends on 
the composition and non-composition variables in a 
correlated fashion, and so the non-composition variables 
also fall within $N_z$ "z-phases" defined in the space of 
composition variables. There are a factor of 10 fewer 
non-composition phases than composition phases. The 
functional form of the model when $\mathbf{x}$ is in 
composition phase $\alpha$ and non-composition phase 
$\gamma$ is 

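$$E(\mathbf{x}, \mathbf{z}) = U_\alpha + \sigma_x \sum_{k=1}^{q} \sum_{i_1 \ge \ldots \ge i_k = 1}^{d} f_{i_1 \ldots i_k}\, \xi_x^{-k} A^{(\alpha k)}_{i_1 \ldots i_k}\, y_{i_1} y_{i_2} \cdots y_{i_k} + \frac{1}{2}\left( W_\gamma + \sigma_z \sum_{k=1}^{q} \sum_{i_1 \ge \ldots \ge i_k = 1}^{b} f_{i_1 \ldots i_k}\, \xi_z^{-k} B^{(\gamma k)}_{i_1 \ldots i_k}\, z_{i_1} z_{i_2} \cdots z_{i_k} \right) \tag{4}$$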



where $f_{i_1 \ldots i_k}$ is a constant symmetry factor, 
$\xi_x$ and $\xi_z$ are constant scale factors, and 
$U_\alpha$, $W_\gamma$, $A^{(\alpha k)}_{i_1 \ldots i_k}$, 
and $B^{(\gamma k)}_{i_1 \ldots i_k}$ are random Gaussian 
variables with unit variance. In more detail, the symmetry 
factor is given by 

$$f_{i_1 \ldots i_k} = \frac{k!}{\prod_{i=1}^{l} o_i!} , \tag{5}$$

where $l$ is the number of distinct integer values in the 
set $\{i_1, \ldots, i_k\}$, and $o_i$ is the number of 
times that distinct value $i$ is repeated in the set. Note 
that $1 \le l \le k$ and $\sum_{i=1}^{l} o_i = k$. The 
scale factors are chosen so that each term in the 
multinomial contributes roughly the same amount: 
$\xi_x = \xi/2$ and 
$\xi_z = (\langle z^6 \rangle / \langle z^2 \rangle)^{1/4} = (3/7)^{1/4}$. 
The $\sigma_x$ and $\sigma_z$ are chosen so that the 
multinomial, crystallinity terms contribute 40% as much as 
the constant, phase terms on average. For both multinomials 
$q = 6$. As Fig. 2 shows, the Random Phase Volume Model 
describes a rugged figure of merit landscape, with subtle 
variations, local maxima, and discontinuous boundaries. 
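
Two of the constants in the model can be checked with a few lines of Python. The sketch assumes that the symmetry factor of Eq. (5) is the multinomial coefficient $k!/\prod_i o_i!$, and that the averages in the scale factor are taken over $z$ uniformly distributed on $[-1, 1]$, for which $\langle z^n \rangle = 1/(n+1)$ when $n$ is even. The function names are hypothetical.

```python
from collections import Counter
from math import factorial


def symmetry_factor(indices):
    """k! divided by the product of o_i! over the distinct index values."""
    k = len(indices)
    denom = 1
    for count in Counter(indices).values():
        denom *= factorial(count)
    return factorial(k) // denom


def uniform_moment(n):
    """<z^n> for z uniform on [-1, 1], valid for even n."""
    return 1.0 / (n + 1)


def xi_z():
    """Scale factor (<z^6> / <z^2>)^(1/4) for the non-composition terms."""
    return (uniform_moment(6) / uniform_moment(2)) ** 0.25
```

For the index set (1, 1, 2) the factor is 3!/(2! 1!) = 3, the number of distinct orderings of the product $y_1 y_1 y_2$, and `xi_z()` evaluates to $(3/7)^{1/4} \approx 0.809$.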



IV. RESULTS 

Six different search protocols are tested with increasing 
numbers of composition and non-composition variables. 
The total number of samples whose figure of merit will 
be measured is fixed at $M$ = 100,000, so that all 
protocols have the same experimental cost. The single pass 
protocols Grid, Random, and LDS are considered. For 
the Grid method, we define $M_x = M^{(d-1)/(d-1+b)}$ and 
$M_z = M^{b/(d-1+b)}$. The grid spacing of the composition 
variables is $\zeta_x = (V_d / M_x)^{1/(d-1)}$, where 

$$V_d = \frac{\sqrt{d}}{(d-1)!} \tag{6}$$

is the volume of the allowed composition simplex. Note 
that the distance from the centroid of the simplex to the 
closest point on the boundary of the simplex is 

$$R_d = \frac{1}{[d(d-1)]^{1/2}} . \tag{7}$$

The spacing for each component of the non-composition 
variables is $\zeta_z = 2 / M_z^{1/b}$. For the LDS method, 
different quasi-random sequences are used for the 
composition and non-composition variables. The feedback 
protocols Monte Carlo, Monte Carlo with swap, and Parallel 
Tempering are considered. The Monte Carlo parameters were 
optimized on test cases. It was optimal to perform 100 
rounds of 1,000 samples with $\beta = 2$ for $d = 3$ and 
$\beta = 1$ for $d = 4$ or 5, and $\Delta x = 0.1 R_d$ and 
$\Delta z = 0.12$ for the maximum random displacement in 
each component. The swapping move consisted of an attempt 
to swap all of the non-composition values between the two 
chosen samples, and it was optimal to use 
$P_{\mathrm{swap}} \simeq 0.1$ for the probability of a 
swap versus a regular random displacement. For Parallel 
Tempering it was optimal to perform 100 rounds with 1,000 
samples, divided into three subsets: 50 samples at 
$\beta_1 = 50$, 500 samples at $\beta_2 = 10$, and 450 
samples at $\beta_3 = 1$. The 50 samples at large $\beta$ 
essentially perform a "steepest-ascent" optimization and 
have smaller $\Delta x = 0.01 R_d$ and $\Delta z = 0.012$. 
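
The grid quantities can be evaluated with a short Python sketch. It assumes the definitions $M_x = M^{(d-1)/(d-1+b)}$, $M_z = M^{b/(d-1+b)}$ (so that $M_x M_z = M$), $V_d = \sqrt{d}/(d-1)!$, $R_d = [d(d-1)]^{-1/2}$, a composition spacing $(V_d/M_x)^{1/(d-1)}$, and a non-composition spacing $2/M_z^{1/b}$. The function and variable names are hypothetical.

```python
import math


def simplex_volume(d):
    """(d-1)-dimensional volume of the simplex x_i >= 0, sum x_i = 1."""
    return math.sqrt(d) / math.factorial(d - 1)


def inradius(d):
    """Distance from the simplex centroid to the nearest boundary point."""
    return 1.0 / math.sqrt(d * (d - 1))


def grid_parameters(M, d, b):
    """Return (M_x, M_z, composition spacing, non-composition spacing)."""
    Mx = M ** ((d - 1) / (d - 1 + b))
    Mz = M ** (b / (d - 1 + b))
    spacing_x = (simplex_volume(d) / Mx) ** (1.0 / (d - 1))
    # With no non-composition variables there is no z grid.
    spacing_z = 2.0 / Mz ** (1.0 / b) if b > 0 else None
    return Mx, Mz, spacing_x, spacing_z
```

For a ternary system with one non-composition variable, `grid_parameters(100000, 3, 1)` splits the budget into roughly 2,154 compositions and 46 values of the non-composition variable. The simplex is then the equilateral triangle of area $\sqrt{3}/2$, and the maximum displacement $\Delta x = 0.1 R_d$ corresponds to `0.1 * inradius(3)`.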

The figures of merit found by the protocols are shown 
in Fig. 3. The Random and LDS protocols find better 
solutions than does Grid in one round of experiment. More 
importantly, the Monte Carlo methods have a tremendous 
advantage over one pass methods, especially as the 
number of variables increases, with Parallel Tempering 
the best method. The Monte Carlo methods, in essence, 
gather more information about how best to search the 
variable space with each succeeding round. This feedback 
mechanism proves to be effective even for the relatively 
small total sample size of 100,000 considered here. We 
expect that the advantage of the Monte Carlo methods 
will become even greater for larger sample sizes. Note 
that in cases such as catalytic activity, sensor response, 
or ligand specificity, the experimental figure of merit 
would likely be exponential in the values shown in Fig. 3, 
so that the success of the Monte Carlo methods would be 
even more dramatic. A better calibration of the parameters 
in Eq. (4) may be possible as more data become 
available in the literature. 

V. CONCLUSION 

To conclude, the experimental challenges in combina- 
torial chemistry appear to lie mainly in the screening 
methods and in the technology for the creation of the 
libraries. The theoretical challenges, on the other hand, 
appear to lie mainly in the library design and redesign 
strategies. We have addressed this second question via 
an analogy with Monte Carlo computer simulation, and 
we have introduced the Random Phase Volume Model to 
compare various strategies. We find the multiple-round, 
Monte Carlo protocols to be especially effective on the 
more difficult systems with larger numbers of composi- 
tion and non-composition variables. 

An efficient implementation of the search strategy is 
feasible with existing library creation technology. 
Moreover, "closing the loop" between library design and 
redesign is achievable with the same database technology 
currently used to track and record the data from com- 
binatorial chemistry experiments. These multiple-round 
protocols, when combined with appropriate robotic con- 
trols, should allow the practical application of combinato- 
rial chemistry to more complex and interesting systems. 

FIG. 1. a) One Monte Carlo round with 10 samples. b) 
One Parallel Tempering round with 5 samples at $\beta_1$ 
and 5 samples at $\beta_2$. 

FIG. 2. The Random Phase Volume Model. The model 
is shown for the case of three composition variables and one 
non-composition variable. The boundaries of the x phases are 
evident by the sharp discontinuities in the figure of merit. To 
generate this figure, the z variable was held constant. The 
boundaries of the z phases are shown as thin dark lines. 

FIG. 3. The maximum figure of merit found with different 
protocols on systems with different numbers of composition 
(x) and non-composition (z) variables. The results are scaled 
to the maximum found by the Grid searching method. Each 
value is averaged over scaled results on 10 different instances 
of the Random Phase Volume Model with different random 
phases. The Monte Carlo methods are especially effective 
on the systems with larger numbers of variables, where the 
maximal figures of merit are more difficult to locate. 